A novel Multi-Layer Attention Framework for visual description prediction using bidirectional LSTM
Authors
Abstract
The massive influx of text, images, and videos on the internet has recently increased the challenge of computer vision-based tasks in big data. Integrating visual data with natural language to generate video descriptions has been a challenge for decades. However, recent experiments on image/video captioning that employ Long Short-Term Memory (LSTM) networks have piqued the interest of researchers studying their possible application to captioning. The proposed architecture combines a bidirectional multilayer LSTM (BiLSTM) encoder with a unidirectional decoder. The approach also considers temporal relations when creating superior global representations. In contrast to the majority of prior work, the most relevant features are selected and utilized specifically for captioning purposes. Existing methods utilize a single-layer attention mechanism for linking the input with the phrase meaning. This approach employs LSTMs to extract characteristics from videos, construct links between multi-modal (words and visual material) representations, and generate sentences with rich semantic coherence. In addition, we evaluated the performance of the suggested system using a benchmark dataset; the obtained results reveal a competitive METEOR score relative to state-of-the-art works and a promising BLEU score. In terms of quantitative performance, the proposed approach outperforms existing methodologies.
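The following is a minimal PyTorch sketch of the encoder-decoder design the abstract describes: a bidirectional multi-layer LSTM encodes per-frame visual features, an additive attention layer weights the most relevant frames at each decoding step, and a unidirectional LSTM generates the caption. The class name, layer sizes, and the specific attention form are illustrative assumptions, not the authors' exact configuration.

# Sketch only: dimensions and attention variant are assumed, not taken from the paper.
import torch
import torch.nn as nn


class BiLSTMAttentionCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, hidden=512, vocab_size=10000, enc_layers=2):
        super().__init__()
        # Bidirectional multi-layer encoder over the frame feature sequence.
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=enc_layers,
                               bidirectional=True, batch_first=True)
        # Additive attention: score each encoded frame against the decoder state.
        self.attn_w = nn.Linear(2 * hidden + hidden, hidden)
        self.attn_v = nn.Linear(hidden, 1)
        self.embed = nn.Embedding(vocab_size, hidden)
        # Unidirectional decoder; its input is [word embedding ; context vector].
        self.decoder = nn.LSTMCell(hidden + 2 * hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, frame_feats, captions):
        # frame_feats: (B, T, feat_dim) CNN features, captions: (B, L) token ids.
        enc_out, _ = self.encoder(frame_feats)              # (B, T, 2*hidden)
        B = frame_feats.size(0)
        h = frame_feats.new_zeros(B, self.decoder.hidden_size)
        c = torch.zeros_like(h)
        logits = []
        for t in range(captions.size(1) - 1):
            # Attention weights over frames, conditioned on the decoder state.
            query = h.unsqueeze(1).expand(-1, enc_out.size(1), -1)
            scores = self.attn_v(torch.tanh(self.attn_w(
                torch.cat([enc_out, query], dim=-1)))).squeeze(-1)
            alpha = torch.softmax(scores, dim=1)            # (B, T)
            context = (alpha.unsqueeze(-1) * enc_out).sum(dim=1)
            step_in = torch.cat([self.embed(captions[:, t]), context], dim=-1)
            h, c = self.decoder(step_in, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                   # (B, L-1, vocab)


# Usage with random tensors standing in for per-frame CNN features.
model = BiLSTMAttentionCaptioner()
feats = torch.randn(2, 30, 2048)             # 2 clips, 30 frames each
caps = torch.randint(0, 10000, (2, 12))      # teacher-forced token ids
print(model(feats, caps).shape)              # torch.Size([2, 11, 10000])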
Similar Resources
Disfluency Detection Using a Bidirectional LSTM
We introduce a new approach for disfluency detection using a Bidirectional Long-Short Term Memory neural network (BLSTM). In addition to the word sequence, the model takes as input pattern match features that were developed to reduce sensitivity to vocabulary size in training, which lead to improved performance over the word sequence alone. The BLSTM takes advantage of explicit repair states in...
Word Sense Disambiguation using a Bidirectional LSTM
In this paper we present a model that leverages a bidirectional long short-term memory network to learn word sense disambiguation directly from data. The approach is end-to-end trainable and makes effective use of word order. Further, to improve the robustness of the model we introduce dropword, a regularization technique that randomly removes words from the text. The model is evaluated on two ...
A Novel Design of a Multi-layer 2:4 Decoder using Quantum-Dot Cellular Automata
The quantum-dot cellular automata (QCA) is considered as an alternative to complementary metal oxide semiconductor (CMOS) technology based on physical phenomena like Coulomb interaction to overcome the physical limitations of this technology. The decoder is one of the important components in digital circuits, which can be used in more comprehensive circuits such as full adde...
Soft + Hardwired Attention: An LSTM Framework for Human Trajectory Prediction and Abnormal Event Detection
As humans we possess an intuitive ability for navigation which we master through years of practice; however, existing approaches to model this trait for diverse tasks including monitoring pedestrian flow and detecting abnormal events have been limited by using a variety of hand-crafted features. Recent research in the area of deep learning has demonstrated the power of learning features directly ...
Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention
In this paper, we proposed a sentence encoding-based model for recognizing text entailment. In our approach, the encoding of a sentence is a two-stage process. Firstly, average pooling was used over word-level bidirectional LSTM (biLSTM) outputs to generate a first-stage sentence representation. Secondly, an attention mechanism was employed to replace average pooling on the same sentence for better represent...
Journal
Journal title: Journal of Big Data
Year: 2022
ISSN: 2196-1115
DOI: https://doi.org/10.1186/s40537-022-00664-6